A Flexible Pragmatics-driven Language Generator for Animated Agents 
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Abstract 

This paper describes the NECA MNLG; 
a fully implemented Multimodal Natu- 
ral Language Generation module. The 
MNLG is deployed as part of the NECA 
system which generates dialogues be- 
tween animated agents. The genera- 
tion module supports the seamless inte- 
gration of full grammar rules, templates 
and canned text. The generator takes in- 
put which allows for the specification of 
syntactic, semantic and pragmatic con- 
straints on the output. 

1 Introduction 

This paper introduces the NECA MNLG; a Multi- 
modal Natural Language Generator It has been 
developed in the context of the NECA system.' 
The NECA system generates dialogue scripts for 
animated characters. A first demonstrator in the 
car sales domain (eShowroom) has been imple- 
mented. It allows a user to browse a database of 
cars, select a car, select two characters and their 
attributes, and subsequently view an automatically 
generated film of a dialogue about the selected car. 
The demonstrator takes the following input: 

• A database with facts about the selected car (maximum 
speed, horse power, etc.). 

• A database which correlates facts with value judge- 
ments. 



'neca stands for 'Net Environment for Embodied Emo- 
tional Conversational Agents' and is an EU-IST project. 



• Information about the characters: 1. Personality traits 
such as extroversion and agreeableness. 2. Personal 
preferences concerning cars (e.g., a preference for safe 
cars). 3. Role of the character (seller or customer). 

This input is processed in a pipeline that consists 
of the following modules in this order: 

• A Dialogue Planner, which produces an abstract 
description of the dialogue (the dialogue plan). 

• A MULTI-MODAL NATURAL LANGUAGE GENERA- 
TOR which specifies linguistic and non-linguistic real- 
izations for the dialogue acts in the dialogue plan. 

• A Speech Synthesis Module, which adds infor- 
mation for Speech. 

• A Gesture Assignment Module, which controls 
the temporal coordination of gestures and speech. 

• A PLAYER, which plays the animated characters and 
the corresponding speech sound files. 

Each step in the pipeline adds more concrete in- 
formation to the dialogue plan/script until finally 
a player can render it. A single XML compliant 
representation language, called RRL, has been de- 
veloped for representing the Dialogue Script at its 
various stages of completion (Piwek et al., 2002). 

In this paper, we describe the requirements for 
the NECA MNLG, how these have been translated 
into design solutions and finally some of aspects 
of the implementation. 

2 Requirements 

The requirements in this section derive primarly 
from the use case of the NECA system. We do, 
however, try to indicate in what respects these re- 
quirements transcend this specific application and 
are desirable for generation systems in general. 



Requirement 1: The linguistic resources of the gen- 
erator should support seamless integration of canned text, 
templates and full grammar rules. 

In the NECA system, the dialogue planner creates 
a dialogue plan consisting of (1) a description 
of the participants, (2) a characterization of the 
common ground at the outset of the dialogue in 
terms of Discourse Representation Theory (Kamp 
and Reyle, 1993) and (3) a set of dialogue acts 
and their temporal ordering. For each dialogue 
act, the type, speaker, set of addressees, semantic 
content, what it is a reaction to (i.e., its rhetorical 
relation to other dialogue acts), and emotions 
of the speaker can be specified. The amount of 
information which the dialogue planner actually 
provides for each of these attributes varies, 
however, per dialogue act: for some dialogue acts, 
a full semantic content can be provided -in the 
form of a Discourse Representation Structure- 
whereas for other acts, no semantic content is 
available at all. Typically, the dialogue planner 
can provide detailed semantics for utterances 
whose content is covered by the domain model 
(e.g., the car domain) whereas this is not possible 
for utterances which play an important role in the 
conversation but are not part of the domain model 
(e.g., greetings). This state of affairs is shared 
with most real-world appUcations. 

Since generation by grammar rules is primarily 
driven by the input semantics, for certain dialogue 
acts full grammar rules cannot be used. These 
dialogue acts may be primarily characterized in 
terms of their, possibly domain specific, dialogue 
act type (greeting, refusal, etc.). Thus, we need 
a generator which can cope with both types of 
input, and map them to the appropriate output. 
Input with little or no semantic content can typ- 
ically be dealt with through templates or canned 
text, whereas input with fully specified semantic 
content can be dealt with through proper grammar 
rules. Summarizing, we need a generator which 
can cope with (hnguistic) resources that contain 
an arbritary combination of grammar rules, 
templates and canned text. 

Requirement 2: The generator should allow for 
combinations of different types of constraints on its the out- 



put, such as syntactic, semantic and pragmatic constraints 

In the NECA project the aim is to generate 
behaviour for animated agents which simulates 
affective situated face-to-face conversational 
interaction. This means that the utterances of the 
agents have to be adapted not only to the content 
of the information which is exchanged but also to 
many other properties of the interlocutors, such as 
their emotional state, gender, cultural background, 
etc. The generator should therefore allow for such 
parameters to be part of its input. 

Requirement 3: The generator should be sufficiently fast 
to be of use in real-world applications 

The application in which our generator is 
being used is currently fielded as part of a net- 
environment. The appUcation will be evaluated 
with users through online questionnaires which 
are integrated in the application and analysis of 
log files (to answer questions such as 'Do users 
try different settings of the appUcation?', etc. See 
Krenn et al., 2002). Therefore, the generator will 
have to be fast in order for it not to negatively 
affect the user experience of the system. 

3 Design Solutions 

The NECA MNLG adopts the conventional pipeline 
architecture for generators (Reiter and Dale, 
2000). Its input is a RRL dialogue plan. This 
is parsed and internally represented as a Profit 
typed feature structure (Erbach, 1995). Subse- 
quently, the dialogue acts in the plan are realized 
in accordance with their temporal order. For each 
act, first a deep syntactic structure is generated. 
The deep structure of referring expressions is dealt 
with in a separate module, which takes the com- 
mon ground of the interlocutors into account. Sub- 
sequently, lexical realization (agreement, inflec- 
tion) and punctuation is performed. Finally, turn- 
taking gestures are added and the output is mapped 
back into the rrl xml format. 

Here let us concentrate on our approach to the 
generation of deep syntactic structure and how it 
satisfies the first two requirements. The input to 
the MNLG is a node (i.e., feature structure) stipu- 
lating the syntactic type of the output (e.g., sen- 



tence: <s), semantics and further information on 
the current dialogue act in Profit:-^ 

(<s & 

sem ! drs ( [ c_27 ] , 

[type ( c_2 7 , prestigious) , 
argl (c_2 7, x_l) ] ) & 
currentAct ! speaker ! 
(name ! john & 
polite ! yes & . . . ) 

) 

Thus various types of information are combined 
within one input node. Generation consists of tak- 
ing the input node and using it to create a tree 
representation of the output. For this purpose, 
the MNLG tries to match the input node with the 
mother node of one of the trees in its tree repos- 
itory. This tree repository contains trees which 
can represent proper grammar rules, templates and 
canned text. Matching trees might in turn have in- 
complete daughter nodes. These are recursively 
expanded by matching them with the trees in the 
repository, until all daughters are complete. 

A daughter node is complete if it is lexically 
realized (i.e., the attribute form of the node has 
a value) or it is of the type <np and the seman- 
tics is an open variable. In the latter instance, the 
node is expanded in a separate step by the refer- 
ring expressions generation module. This module 
finds the discourse referent in the common ground 
which binds the open variable and constructs a de- 
scription of the object in question. The descrip- 
tion is composed of the properties which the ob- 
ject has according to the common ground, but can 
also be empty if the object is highly salient. The 
module is based on the work of Krahmer and The- 
une (2002). The (empty) description is mapped 
to a deep syntactic structure using the tree repos- 
itory. Lexicalization subsequently yields expres- 
sions such as 'it' (empty descriptive content) or, 
for instance, 'the red car' . 

Let us return to the tree repository and illus- 
trate how templates and rules can be represented 
uniformly. The representation of a tree is of the 

^That is, Prolog with some sugaring for the rep- 
resentation of feature structures. Feature structures are 
also used in the FUF/SURGE generator It is different 
from the NECA MNLG in that it takes as input thematic 
trees with content words. Furthermore, it allows for con- 
trol annotations in the grammar and uses a special inter- 
preter for unification, rather than directly PROLOG. See 
jhttp : / / www ■ cs ■ bgu . ac . il/ surge/. 



form (Node, [Treel, Tree2, ...]), where 
the list of trees can be empty, yielding a tree con- 
sisting of one node: (Node, [ ] ) . The following 
is a template for dialogue acts of type greeting 
with no semantic content and a polite speaker. 

(<s & 

currentAct ! 

(type ! greeting & 
speaker ! polite ! "yes " & 
speaker ! name ! Speaker ) & 
sem ! "none " , 

[ (<s & form! "hello! ", [] ) , 
(<fragment & 
form ! "My name is " , [ ] ) , 
(<np & 

sem! concept (Speaker) , [ ] ) 
] ) . 

This is a template for the text 'Hello! My name is 
Speaker'. Where Speaker is a variable which 
is bound to the name of the speaker of the utter- 
ance. The noun phrase (<np) for this name is gen- 
erated by the referring expression generation mod- 
ule. The following is a tree representing a gram- 
mar rule of the familiar type S —>■ NP VP: 

(<s & 

currentAct ! type ! statement & 
currentAct ! CA & 
argGap ! ArgGap & 
auxGap ! AuxGap & 
sem!drs(_, [negation ( 
drs (_, 

[type (E, Type) 
argl (E,X) |R] ) ) ] 
) , 

[ (<np & 

currentAct ! CA & 
sem!X, [] ) , 
(<vp & 

argGap ! ArgGap & 
auxGap ! AuxGap & 
negated ! <true & 
sem! drs (_, [type (E, Type) I R] ) & 
currentAct ! CA, _) 
] ) . 

Note that this rule applies to an input node whose 
semantic content contains a negation. The nega- 
tion is passed on to the VP subtree via the feature 
negated. The attributes argGap and auxGap 
allow us to capture unbounded dependencies via 
feature perlocation. Our use of trees is related to 
the Tree Adjoining Grammar approach to genera- 
tion (e.g., Stone and Doran, 1997).^ 

^Their generation algorithm is, however, very different 
from the one proposed here. Whereas they propose an in- 
tegrated planning approach, we advocate a very modular sys- 



The value of the attribute currentAct is 
passed on from the mother node to the daughter 
nodes. Thus any pragmatic information (personal- 
ity, politeness, emotion, etc.) is passed on through 
the tree and can be accessed at a later stage, for 
instance, when lexical items are selected. 

4 Implementation 

The NECA MNLG has been implemented in PRO- 
LOG. The output is in the form of an RRL XML 
document. Table 1 provides a sample of the re- 
sponse times of the compiled code running on a 
Pentium III Mobile 1200 Mhz with Sicstus 3.8.5 
PROLOG. We timed the complete generation pro- 
cess from parsing the XML input to producing 
XML output, including generation of deep syn- 
tactic structure, referring expressions, turn taking 
gestures (not discussed in this paper), etc. 



input 


# acts 


= 1 


< 10 


A 


19 


0.230s 


0.741s 


B 


22 


0.290s 


0.872s 


C 


23 


0.290s 


0.801s 


D 


31 


0.431s 


1.372s 



Table 1: Response Times of the mnlg 

The results show generation times for entire di- 
alogues and according to whether the generator 
was asked to produce exactly one solution or se- 
lect at random a solution from a set of at most ten 
generated solutions (the latter strategy was imple- 
mented to obtain more variation in the generator 
output). On average for = 1 the generation time 
for an individual dialogue act is almost ^ of a 
second. For < 10 it is of a second. The 
generator uses a repository of 138 trees (includ- 
ing the two examples given above). The repos- 
itory has been developed for and integrated into 
the eShowroom system which is currently be- 
ing fielded. A start is being made with porting the 
MNLG to a new domain and documentation is be- 
ing created to allow our project partners to carry 
out this task. We hope that our efforts will con- 
tribute to addressing a challenge expressed in (Re- 

tem, supporting fast generation. Moreover, by using features 
for unbounded dependencies we do not require the adjunction 
operation, which is incompatible with our topdown genera- 
tion approach. We follow Nicolov et al. (1996), who also use 
TAG, in their commitment to flat semantics. Their generator 
does, however, not take pragmatic constraints into account. 



iter, 1999): "We hope that future systems such as 
STOP will be able to make more use of deep tech- 
niques, because of advances in linguistics and the 
development of reusable wide-coverage NLG com- 
ponents that are robust, well-documented and well 
engineered as software artifacts." 

In our view the best way to approach this goal 
is by providing a framework which allows for the 
flexible integration of shallow and deep genera- 
tion, thus making it possible that in the course of 
various projects, deep analyses can be developed 
alongside the shallow solutions which are diffi- 
cult to avoid altogether in software development 
projects, due to the pressure to deliver a complete 
system within a certain span of time. 
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